
Functional Gradient Descent



Sinkhorn Barycenter via Functional Gradient Descent

Neural Information Processing Systems

In this paper, we consider the problem of computing the barycenter of a set of probability distributions under the Sinkhorn divergence. This problem has recently found applications across various domains, including graphics, learning, and vision, as it provides a meaningful mechanism to aggregate knowledge. Unlike previous approaches which directly operate in the space of probability measures, we recast the Sinkhorn barycenter problem as an instance of unconstrained functional optimization and develop a novel functional gradient descent method named \texttt{Sinkhorn Descent} (\texttt{SD}). We prove that \texttt{SD} converges to a stationary point at a sublinear rate, and under reasonable assumptions, we further show that it asymptotically finds a global minimizer of the Sinkhorn barycenter problem. Moreover, by providing a mean-field analysis, we show that \texttt{SD} preserves the {weak convergence} of empirical measures. Importantly, the computational complexity of \texttt{SD} scales linearly in the dimension $d$ and we demonstrate its scalability by solving a $100$-dimensional Sinkhorn barycenter problem.
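The core machinery here can be illustrated with a small sketch: Sinkhorn fixed-point iterations compute an entropic transport plan between two point clouds, and the particles of the current measure are then moved along the gradient of the transport cost. This is illustrative only, not the paper's exact algorithm (which uses the debiased Sinkhorn divergence rather than the plain entropic cost); the step size and iteration counts are assumptions.

```python
import numpy as np

def sinkhorn(x, y, eps=0.1, iters=300):
    """Entropic OT between uniform empirical measures on 1-D point clouds."""
    C = (x[:, None] - y[None, :]) ** 2        # quadratic ground cost
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(iters):                    # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]           # entropic transport plan
    return P, float((P * C).sum())

x = np.linspace(0.0, 1.0, 20)                 # particles of the current measure
y = np.linspace(0.5, 1.5, 20)                 # target measure support
P, cost = sinkhorn(x, y)

# One "descent" step: move each particle along the transport-cost gradient,
# d/dx_i of sum_j P_ij (x_i - y_j)^2 with the plan held fixed.
grad = 2.0 * (P * (x[:, None] - y[None, :])).sum(axis=1)
x_new = x - 1.0 * grad
_, new_cost = sinkhorn(x_new, y)
```

Because the particles drift toward the target support, the entropic transport cost decreases after the step; iterating this update is the particle-flow idea behind the method.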



Nonparametric Teaching for Graph Property Learners

Zhang, Chen, Bu, Weixin, Ren, Zeyi, Liu, Zhengwu, Wu, Yik-Chung, Wong, Ngai

arXiv.org Artificial Intelligence

Inferring properties of graph-structured data, e.g., the solubility of molecules, essentially involves learning the implicit mapping from graphs to their properties. This learning process is often costly for graph property learners like Graph Convolutional Networks (GCNs). To address this, we propose a paradigm called Graph Neural Teaching (GraNT) that reinterprets the learning process through a novel nonparametric teaching perspective. Specifically, the latter offers a theoretical framework for teaching implicitly defined (i.e., nonparametric) mappings via example selection. Such an implicit mapping is realized by a dense set of graph-property pairs, with the GraNT teacher selecting a subset of them to promote faster convergence in GCN training. By analytically examining the impact of graph structure on parameter-based gradient descent during training, and recasting the evolution of GCNs--shaped by parameter updates--through functional gradient descent in nonparametric teaching, we show for the first time that teaching graph property learners (i.e., GCNs) is consistent with teaching structure-aware nonparametric learners. These new findings readily commit GraNT to enhancing learning efficiency of the graph property learner, showing significant reductions in training time for graph-level regression (-36.62%), graph-level classification (-38.19%), node-level regression (-30.97%) and node-level classification (-47.30%), all while maintaining its generalization performance.
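The teaching-by-example-selection idea can be sketched in miniature (this is not the GraNT algorithm itself, only a toy analogue under assumed settings): a least-squares learner is trained each round on the subset of examples the "teacher" currently finds hardest, i.e., those with the largest residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                                 # noiseless labels

w = np.zeros(d)
lr, batch = 0.1, 32
initial_loss = np.mean((X @ w - y) ** 2)
for _ in range(300):
    resid = X @ w - y
    idx = np.argsort(-np.abs(resid))[:batch]   # teacher selects hardest examples
    w -= lr * X[idx].T @ resid[idx] / batch    # gradient step on selected subset
final_loss = np.mean((X @ w - y) ** 2)
```

Selecting the largest-residual examples maximizes the alignment between the batch gradient and the current error direction, which is the intuition for the faster convergence the paper reports.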


Review for NeurIPS paper: Sinkhorn Barycenter via Functional Gradient Descent

Neural Information Processing Systems

Weaknesses: The constants in the bounds depend linearly on the dimension, although they depend exponentially on the regularization parameter. If the Sinkhorn distance is viewed as a proxy for the Wasserstein distance, this suggests a hidden dependence on the dimension, since the regularization parameter interpolates between the MMD and Wasserstein distances, and MMD distances are largely blind to the dimension. This is not discussed in the paper. The results also have an exponential dependence on an assumed uniform upper bound on the cost. For the classical quadratic cost, this implies an exponential dependence on the dimension for measures supported on $[0,1]^d$, for instance.


Review for NeurIPS paper: Sinkhorn Barycenter via Functional Gradient Descent

Neural Information Processing Systems

This paper proposes a new method to compute the (Sinkhorn) barycenter of several probability measures. In practice, the method scales well computationally and in high dimension, and the authors provide some theoretical support. Reviewers agree that this paper is strong with only minor weaknesses (such as the exponential dependency on 1/gamma, which in related work is often suboptimal as well). I thus recommend accept (poster).



Nonparametric Teaching of Implicit Neural Representations

Zhang, Chen, Luo, Steven Tin Sui, Li, Jason Chun Lok, Wu, Yik-Chung, Wong, Ngai

arXiv.org Artificial Intelligence

We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.
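A toy sketch of fragment selection for signal fitting (illustrative assumptions throughout: a linear model on random Fourier features stands in for the overparameterized MLP, and the signal is 1-D): each round, the "teacher" trains only on the fragment of samples with the largest fitting error.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 256)
signal = np.sin(2 * np.pi * 4 * xs)              # target signal to be fitted

W = rng.normal(scale=30.0, size=32)              # random Fourier frequencies
b = rng.uniform(0, 2 * np.pi, size=32)
Phi = np.cos(xs[:, None] * W[None, :] + b)       # feature matrix (256, 32)

theta = np.zeros(32)
frag = 32                                        # fragment length (8 fragments)
for _ in range(500):
    err = Phi @ theta - signal
    block_err = (err.reshape(-1, frag) ** 2).mean(axis=1)
    j = int(np.argmax(block_err))                # pick the hardest fragment
    sl = slice(j * frag, (j + 1) * frag)
    theta -= 0.02 * Phi[sl].T @ err[sl] / frag   # train only on that fragment
mse = np.mean((Phi @ theta - signal) ** 2)
```

Iterating on the worst-fit fragment concentrates updates where the residual is largest, which is the flavor of example selection INT uses to cut training time.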


Stacking as Accelerated Gradient Descent

Agarwal, Naman, Awasthi, Pranjal, Kale, Satyen, Zhao, Eric

arXiv.org Machine Learning

Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training deep neural networks. In this paper, we propose a theoretical explanation for the efficacy of stacking: viz., stacking implements a form of Nesterov's accelerated gradient descent. The theory also covers simpler models such as the additive ensembles constructed in boosting methods, and provides an explanation for a similar widely-used practical heuristic for initializing the new classifier in each round of boosting. We also prove that for certain deep linear residual networks, stacking does provide accelerated training, via a new potential function analysis of the Nesterov's accelerated gradient method which allows errors in updates. We conduct proof-of-concept experiments to validate our theory as well.
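The accelerated method the paper connects stacking to can be demonstrated directly (a sketch of Nesterov's accelerated gradient descent on an ill-conditioned quadratic, not of the stacking procedure itself; the problem instance is an assumption): with the same step size, the accelerated iterate reaches a much lower objective value than plain gradient descent.

```python
import numpy as np

# Ill-conditioned quadratic f(w) = 0.5 * w^T A w, minimized at w = 0.
A = np.diag(np.array([0.0025, 1.0]))
L = 1.0                                      # smoothness (largest eigenvalue)

def f(w):
    return 0.5 * w @ A @ w

T = 200
g = np.array([1.0, 1.0])                     # plain gradient descent iterate
w = np.array([1.0, 1.0])                     # Nesterov iterate
w_prev = w.copy()
for t in range(1, T + 1):
    g = g - (1.0 / L) * (A @ g)              # gradient descent step
    y = w + (t - 1.0) / (t + 2.0) * (w - w_prev)   # momentum look-ahead point
    w_prev = w
    w = y - (1.0 / L) * (A @ y)              # accelerated (Nesterov) step
```

The look-ahead point `y` plays the role the paper ascribes to copying parameters from older layers when a new layer is stacked on.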


MetaFun: Meta-Learning with Iterative Functional Updates

Xu, Jin, Ton, Jean-Francois, Kim, Hyunjik, Kosiorek, Adam R., Teh, Yee Whye

arXiv.org Machine Learning

Few-shot supervised learning leverages experience from previous learning tasks to solve new tasks where only a few labelled examples are available. One successful line of approach to this problem is to use an encoder-decoder meta-learning pipeline, whereby labelled data in a task is encoded to produce task representation, and this representation is used to condition the decoder to make predictions on unlabelled data. We propose an approach that uses this pipeline with two important features. 1) We use infinite-dimensional functional representations of the task rather than fixed-dimensional representations. 2) We iteratively apply functional updates to the representation. We show that our approach can be interpreted as extending functional gradient descent, and delivers performance that is comparable to or outperforms previous state-of-the-art on few-shot classification benchmarks such as miniImageNet and tieredImageNet.
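The functional-update idea can be sketched for a single regression task (illustrative only; MetaFun learns its update rules, whereas this sketch uses a fixed RBF kernel as the functional gradient's smoother): the predictor is updated directly in function space using the residuals on the labelled context set, and the same update is evaluated at the query points.

```python
import numpy as np

Xc = np.linspace(-1.0, 1.0, 10)             # context (labelled) inputs
yc = np.sin(3 * Xc)                         # context targets
Xq = np.linspace(-1.0, 1.0, 50)             # query (unlabelled) inputs

def rbf(a, b, ell=0.3):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell ** 2))

Kcc = rbf(Xc, Xc)
Kqc = rbf(Xq, Xc)
rc = np.zeros(10)                           # predictor evaluated at context
rq = np.zeros(50)                           # predictor evaluated at queries
alpha = 0.1
for _ in range(200):
    resid = rc - yc                         # functional gradient of the squared loss
    rc -= alpha * Kcc @ resid               # kernel-smoothed update at context points
    rq -= alpha * Kqc @ resid               # same functional update at query points
context_mse = np.mean((rc - yc) ** 2)
```

Each iteration is one functional gradient step; iterating the update is what the abstract means by applying functional updates to the task representation.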